This is an example pipeline to run flow of two tumor/normal pairs through the hg19 pipeline for JaBbA. There are a few steps to setting up and running these jobs. After getting through the pipeline we we run a job post JaBbA creation and work towards visualizing some of the plots.
The pipeline that we will run through is covered here:
Here we align fastq files into bams for tumor/normal paired samples for WGS. Next, we run a series of jobs that are required inputs for the JaBbA pipeline. However, since the alignment of WGS samples can take upwards of a week, we will start off with a couple aligned hg19 BAM files and begin to work from there.
This is our pairs table in the purest form, where we have bam files for the associated tumor and normal sample. We will run the samples through the pipeline:
pairs = readRDS("db/pairs.rds")
pairs
## pair
## 40 25008780
## 49 25023660
## normal_bam
## 40 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25008780-151/Sample_25008780-151.bam
## 49 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25023660-549/Sample_25023660-549.bam
## tumor_bam
## 40 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25008780-DNAt/Sample_25008780-DNAt.bam
## 49 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25023660DNAt/Sample_25023660DNA.bam
There is intentionally something wrong with one of these samples: try and figure this out!
Once we have our generated bams we are going to run a series of jobs on them to generate the prerequisites for running JaBbA. There are a series of jobs that will need to done on the bams and they can all be ran in tandem. So let’s try and get one of these going.
The first one is the hetpileup caller, which generates a pile up of our allele fractions for a sample.
cat ~/tasks/hg19/core_dryclean/HetPileups.task
## # HetPileupsWGS urn:lsid:broadinstitute.org:cancer.genome.analysis:10898:6
## ~/modules/Pileup
## input TumorBam tumor_bam_wgs path
## input KeepOnlyHets "TRUE" value
## input NormalBam normal_bam_wgs path
## input NumCores cores value "4"
## input MaxDepth max_depth value "1000" ## default is 500 (from bamUtils::mafcount > bamUtils::varcount
## input SitesVCF hapmap.sites path "/gpfs/commons/groups/imielinski_lab/DB/references/hg19/hapmap_3.3.b37.vcf.gz"
## output het_pileups_wgs sites.txt$
Here we see a few arguments that will need to be inputed, and there are a couple columns in our pairs table that may need to be renamed. The task text table format is something like this:
| col1 | col2 | col3 | col4 | col5 |
|---|---|---|---|---|
| input or output | value name inside the Job class | input label/colname of the file | class expected of input | any default argument |
For more information on how to set up jobs, the wiki on Flow is a good source to follow.
These jobs/tasks that need to be run on the bams and can all be done in tandem. Expect if all things go well with all the jobs that it should be done within a few hours.
Here’s a table of the BAM jobs to run:
| tool | task location | description |
|---|---|---|
| Svaba | ~/tasks/hg19/core_dryclean/Svaba.task |
SV caller |
| Strelka | ~/tasks/hg19/core_dryclean/Strelka2 |
SNV caller |
| fragcounter (on tumor) | ~/tasks/hg19/core_dryclean/fragCounter.task |
fragment counting across the genome with GC and mappability corrections |
| fragcounter (on normal) | ~/tasks/hg19/core_dryclean/fragCounter.normal.task |
same as above but for normal |
| HetPileup | ~/tasks/hg19/core_dryclean/HetPileups.task |
Get allele fractions for het sites across genome |
After all of these steps are ran and processed, remember to merge their outputs together and save this into our pairs table.
Alternatively, one can also use GRIDSS/GRIPSS as a SV and junction breakpoint detector.
If all things are working well, we can begin processing some of the bam outputs in preparation for JaBbA. One of these is dryclean, which further processes the fragcounter outputs to further remove noise and signal via rPCA using a PON.
Dryclean our fragcounter outputs for both tumor and normal.
The tasks associated with dry clean:
dryclean normal: ~/tasks/hg19/core_dryclean/Dryclean.normal.task
dryclean tumor: ~/tasks/hg19/core_dryclean/Dryclean.task
Then we use CBS to get our segments ready for JaBbA. The task expects the fragcounter outputs, so we’ll need to use a temporary pairs table to edit our colnames for our dryclean outputs to match what CBS task expects.
After this step is done, run CovCBS: ~/tasks/hg19/core_dryclean/CBS.task
MERGE ALL OF THIS INTO YOUR PAIRS TABLE BEFORE PROCEEDING.
Ascat purity/ploidy estimation and other estimations can be done separately here. This job is not necessary as JaBbA will do its own estimation, but it could help with a few fringe cases.
The task is: ~/tasks/ascat_seg.task
Now that we have all the right inputs, it’s time to run JaBbA. The task is located here: ~/tasks/hg19/core_dryclean/JaBbA.task
and looking into the task:
Flow::Task("~/tasks/hg19/core_dryclean/JaBbA.task")
## No methods found in package 'data.table' for request: 'key' when loading 'Flow'
## #Module JaBbA [Task: JaBbA path: ~/tasks/hg19/core_dryclean/JaBbA.task] ("sh <libdir>/run.sh <RAfile> <CovFile> --verbose --name <TumorName> --seg <SegFile> --field <CovField...")
## ~/modules/JaBbA/
## input CovFile tumor_dryclean_cov path
## input RAfile svaba_somatic_vcf path
## input CovField field value foreground
## input RAsupp svaba_unfiltered_somatic_vcf path /dev/null
## input TierFieldName tfield value tier
## input NormalSegFile cbs_nseg_rds path /dev/null
## input SegFile cbs_seg_rds path /dev/null
## input SlackPenalty slack value 20
## input OptionalHetPileupOutput het_pileups_wgs path /dev/null
## input Purity purity value NA
## input Ploidy ploidy value NA
## input tilim tilim value 9000
## input epgap epgap value 0.1
## input ppmethod pp.method value sequenza
## input maxna maxna value -1
## input blacklist.coverage blacklist.coverage path /gpfs/commons/groups/imielinski_lab/DB/JaBbA/hg19.coverage.blacklist.rds
## input blacklist.junctions blacklist.junctions path /dev/null
## input flags flags value
## input NumIterations iter value 0
## input TumorName pair value tumor
## input job.spec.memory "15" value
## input indel indel value exclude
## input lp lp value TRUE
## input ism ism value TRUE
## input cnsignif cnsignif value 0.01
## input mem treemem value 16
## input fix.thres fix.thres value -1
## output jabba_rds jabba.simple.rds$
## output jabba_gg jabba.simple.gg.rds$
## output jabba_vcf jabba.simple.vcf$
## output jabba_raw_rds jabba.raw.rds$
## output opti opt.report.rds$
## output jabba_seg jabba.seg$
## output kag karyograph.rds$
We have bunch of values that will need to be filled. Here are some of the inputs that will need to be altered to:
tumor_dryclean_cov should be tumor_dryclean_covsvaba_somatic_vcf should be svaba_somatic_vcffield should be foreground if you are following the blog filecbs_seg_rds should match cbs outputs from the merge.cbs_nseg_rds should match cbs outrputs from the merge.het_pileups_wgs should be the het_pileup sites.txt output for that file.Purity and ploidy values can be precalculated via the ascat job and inputted here; otherwise the task and JaBbA will estimate these for you.
There are a few other estimated parameters that can be played around with for the fits, and in your free time, I would recommend playing around with these to see how the solver is affected by the various arguments.
From here, run the program, and we should be good to go. Expect each pair sample to take at most a few hours. There will occasionally be cases where the solver in JaBbA fails to converge, and it will become important to play around with parameters to solve for cases like these.
Most post-JaBbA tasks are either related to the sample as a whole or the ggraph structure generated by the JaBbA job. Here we will go into a simple post JaBbA task which is “event calling” via gGnome. We run the task ~/tasks/Events.task which calls the event caller from gGnome. From here the caller uses set thresholds and criteria to identify the type of classes and events are found within each sample.
Most of these jobs like Fusions, Alterations, and Events are quick, so we can expect a reasonably fast turnaround.
Finally, plotting is done via gGnome. Let’s try and plot some of our JaBbA outputs.
Use the gGnome tutorial for reference: Link
Using the Complex JaBbA output from the Events job:
Plot a basic SV event via gGraph.
Tweak the parameters around the range of the event to increase by 10K bp on each side
Generate a gWalk of one of these SV events.
Plot and save these plots via ppng or ppdf on mskiweb.
Read into these in your free time!
101 on Flow Usage: http://mskiweb/101/flow.html
Links to Tools within the pipeline: